Application-Specific Memory Subsystems
Authors: Joseph G. Wingbermuehle
Abstract
Dissertation by Joseph G. Wingbermuehle, Doctor of Philosophy in Computer Science, Washington University in St. Louis, 2015. Professor Roger D. Chamberlain, Chair.

The disparity in performance between processors and main memories has led computer architects to incorporate large cache hierarchies in modern computers. These cache hierarchies are designed to be general-purpose in that they strive to provide the best possible performance across a wide range of applications. However, such a memory subsystem does not necessarily provide the best possible performance for a particular application. General-purpose memory subsystems are desirable when the workload is unknown and the memory subsystem must remain fixed; when this is not the case, a custom memory subsystem may be beneficial. For example, in an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA) designed to run a particular application, a custom memory subsystem optimized for that application would be desirable. In addition, when there are tunable parameters in the memory subsystem, it may make sense to change these parameters depending on the application being run. Such a situation arises today with FPGAs and, to a lesser extent, GPUs, and it is plausible that general-purpose computers will begin to support greater flexibility in the memory subsystem in the future.

In this dissertation, we first show that it is possible to create application-specific memory subsystems that provide much better performance than a general-purpose memory subsystem. In addition, we show a way to discover such memory subsystems automatically using a superoptimization technique on memory address traces gathered from applications. This allows one to generate a custom memory subsystem with little effort.

We next show that our memory subsystem superoptimization technique can be used to optimize for objectives other than performance. As an example, we show that it is possible to reduce the number of writes to the main memory, which can be useful for main memories with limited write durability, such as flash or Phase-Change Memory (PCM).

Finally, we show how to superoptimize memory subsystems for streaming applications, which are a class of parallel applications. In particular, we show that, through the use of ScalaPipe, we can author and deploy streaming applications targeting FPGAs with superoptimized memory subsystems. ScalaPipe is a domain-specific language (DSL) embedded in the Scala programming language for generating streaming applications that can be implemented on CPUs and FPGAs. Using the ScalaPipe implementation, we are able to demonstrate actual performance improvements using the superoptimized memory subsystem with applications implemented in hardware.
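The idea of searching for a memory subsystem against a recorded address trace can be illustrated with a toy example. The sketch below is not the dissertation's superoptimizer, whose algorithm and cost model are not given in this abstract. It assumes a direct-mapped cache, made-up hit and miss costs, and a plain random search under a byte budget; the names `simulate` and `search` are hypothetical.

```python
import random

def simulate(trace, line_size, num_lines, hit_cost=1, miss_cost=100):
    """Total access time of a direct-mapped cache over an address trace.

    hit_cost and miss_cost are illustrative cycle counts, not measured values.
    """
    tags = [None] * num_lines          # block currently held by each cache line
    total = 0
    for addr in trace:
        block = addr // line_size      # memory block containing this address
        index = block % num_lines      # direct-mapped placement
        if tags[index] == block:
            total += hit_cost
        else:
            tags[index] = block        # replace the line on a miss
            total += miss_cost
    return total

def search(trace, budget_bytes, iterations=100, seed=0):
    """Random search over (line_size, num_lines) within a capacity budget."""
    rng = random.Random(seed)
    best_cfg, best_cost = None, float("inf")
    for _ in range(iterations):
        line_size = 2 ** rng.randint(2, 6)            # 4 to 64 bytes
        max_lines = budget_bytes // line_size
        if max_lines < 1:
            continue                                   # candidate exceeds budget
        num_lines = 2 ** rng.randint(0, max_lines.bit_length() - 1)
        cost = simulate(trace, line_size, num_lines)
        if cost < best_cost:
            best_cfg, best_cost = (line_size, num_lines), cost
    return best_cfg, best_cost

# Hypothetical trace: repeated sequential sweep over a 512-byte array.
trace = [(i * 4) % 512 for i in range(2000)]
best_cfg, best_cost = search(trace, budget_bytes=256)
print(best_cfg, best_cost)
```

Replacing the cost returned by `simulate` with a different metric, such as a count of writes reaching main memory, would let the same search loop target another objective.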
Related resources
Superoptimizing Memory Subsystems for Multiple Objectives
We consider the automatic determination of application-specific memory subsystems via superoptimization, with the goals of reducing memory access time and of minimizing writes. The latter goal is of concern for memories with limited write endurance. Our subsystems outperform general-purpose memory subsystems in terms of performance, number of writes, or both.
User-Level Management of Kernel Memory
Kernel memory is a resource that must be managed carefully in order to ensure the efficiency and safety of the system. The use of an inappropriate management policy can weaken the isolation between subsystems, lead to suboptimal performance, and even make the kernel vulnerable to denial-of-service attacks. Yet, many existing kernels use only a single built-in policy, which is always a compromis...
Dissociable neural subsystems underlie visual working memory for abstract categories and specific exemplars.
An ongoing debate concerns whether visual object representations are relatively abstract, relatively specific, both abstract and specific within a unified system, or abstract and specific in separate and dissociable neural subsystems. Most of the evidence for the dissociable subsystems theory has come from experiments that used familiar shapes, and the usage of familiar shapes has allowed for a...
Analysis of Multithreaded Multiprocessors with Distributed Shared Memory
In this paper we propose an analytical model, based on multi-chain closed queuing networks, to evaluate the performance of multithreaded multiprocessors. The queuing network is solved by using approximate Mean Value Analysis. Unlike earlier work which modeled individual subsystems in isolation, our work models processor, memory and network subsystems in an integrated manner. Such an approach b...
A Heterogeneous Multiprocessor Architecture for Flexible Media Processing
© 2002 IEEE, July–August 2002. New media applications such as high-definition digital television, set-top boxes with time-shift functionality, 3D games, video conferencing, and MPEG-4 interactivity have generated a demand for increasingly flexible consumer electronics products. These products are evolving into multifunctional devices that combine a set of media applications. Th...